Belief networks (also known as Bayesian networks, causal networks, or probabilistic networks)
represent dependencies between variables and give a concise specification of a joint probability
distribution. They enable a general-purpose inference method that can answer a broad class of
queries given information that is uncertain or incomplete. In this research project, we have
investigated methods and implemented algorithms for efficiently making certain classes of
inference in belief networks, and for automatically learning certain classes of belief networks to
make more accurate inferences.


The progress on this project falls into two related areas:
• Inference
• Learning

In each case, progress has been made both on understanding and unifying existing approaches and
on the development of new methods.

Inference in Belief Networks

In this research, we recently demonstrated that many algorithms for probabilistic inference, such as
belief updating, finding the most probable explanation, finding the maximum a posteriori hypothesis,
and finding the maximum expected utility, can also be expressed as bucket-elimination algorithms.
Bucket elimination is a unifying algorithmic framework that generalizes dynamic programming to
accommodate many complex problem-solving and reasoning activities. Algorithms such as
directional resolution for propositional satisfiability, adaptive consistency for constraint
satisfaction, Fourier and Gaussian elimination for linear equalities and inequalities, and dynamic
programming for combinatorial optimization can all be accommodated within this framework.
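
The elimination step behind belief updating can be illustrated with a minimal sketch. It is not the project's implementation: the factor representation (a tuple of variable names plus a table keyed by binary value tuples), the function names, and the three-variable chain A -> B -> C with its probabilities are all assumptions made for illustration. Each pass collects the functions that mention one variable (its bucket), multiplies them, and sums the variable out.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors over binary variables."""
    (f_vars, f_tab), (g_vars, g_tab) = f, g
    scope = tuple(dict.fromkeys(f_vars + g_vars))  # union of scopes, order kept
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = (f_tab[tuple(a[v] for v in f_vars)]
                       * g_tab[tuple(a[v] for v in g_vars)])
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor `f`."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    table = {}
    for vals, p in f_tab.items():
        key = vals[:i] + vals[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return f_vars[:i] + f_vars[i + 1:], table

def bucket_elimination(factors, order):
    """Eliminate the variables in `order`; return the remaining joint factor."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]      # functions mentioning var
        if not bucket:
            continue
        factors = [f for f in factors if var not in f[0]]
        combined = bucket[0]
        for f in bucket[1:]:
            combined = multiply(combined, f)
        factors.append(sum_out(combined, var))            # new function, placed in a later bucket
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Hypothetical chain A -> B -> C with made-up conditional probability tables.
p_a = (("A",), {(0,): 0.6, (1,): 0.4})
p_b_given_a = (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
p_c_given_b = (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

marginal = bucket_elimination([p_a, p_b_given_a, p_c_given_b], order=["A", "B"])
print(marginal)  # factor over C only: the marginal P(C)
```

Replacing the sum in `sum_out` with a maximization would turn the same loop into a search for the value of the most probable explanation, which is the sense in which the tasks share one framework.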

The main virtues of this framework are simplicity and generality. All bucket-elimination
algorithms are sufficiently similar that an improvement to a single algorithm is applicable
to all others in the class. For example, by expressing probabilistic inference algorithms
as bucket-elimination methods, their relationship to dynamic programming and to constraint
satisfaction methods becomes perspicuous, which allows the knowledge accumulated in those areas
to be utilized. In summary, bucket elimination provides a unified framework for the expression of
fundamental algorithms in a diverse class of fields: rather than "reinventing the wheel," the
framework allows exploiting and transferring ideas. For example, complexity bounds that are
derived in one area (e.g., constraint networks) can apply to other areas (e.g., belief networks)
when both are viewed as special cases of bucket elimination.

The key results on inference in belief networks have been published in papers presented at the
Uncertainty in Artificial Intelligence conference and the International Joint Conference on
Artificial Intelligence. These papers are summarized below.

Dechter, R. (1996). Bucket Elimination: A Unifying Framework for Probabilistic Inference.
Uncertainty in Artificial Intelligence, UAI96. Portland, Oregon, pp. 220-227.

Probabilistic inference algorithms for belief updating, finding the most probable
explanation, the maximum a posteriori hypothesis, and the maximum expected utility were
reformulated within the bucket-elimination framework. This emphasized the principles
common to many of the algorithms appearing in the probabilistic inference literature and
clarified the relationship of such algorithms to nonserial dynamic programming algorithms.
A general method for combining conditioning and bucket elimination was developed.

Dechter, R. (1996). Topological parameters for time-space tradeoff. Uncertainty in Artificial
Intelligence, UAI96. Portland, Oregon, pp. 220-227.

We proposed a family of algorithms combining tree-clustering with conditioning that trade
space for time. Such algorithms are useful for reasoning in probabilistic and deterministic
settings. By analyzing the problem structure, we showed that it is possible to select from
a spectrum the algorithm that best meets a given time-space specification.

El Fattah, Yousri and Dechter, Rina (1996). An Evaluation of Structural Parameters for
Probabilistic Reasoning: Results on Benchmark Circuits. Uncertainty in Artificial Intelligence,
UAI96. Portland, Oregon, August, pp. 220-22’

We studied the potential of structure-based algorithms in real-life applications. Many
algorithms for processing probabilistic networks depend on the topological properties
of the problem's structure. Such algorithms (e.g., clustering and conditioning) are effective
only if the problem has a sparse graph, as captured by parameters such as tree width and
cycle-cutset size. We analyzed empirically the structural properties of problems coming from the
circuit diagnosis domain. Specifically, we located those properties that capture the
effectiveness of clustering and conditioning, as well as of a family of
conditioning+clustering algorithms designed to gradually trade space for time. We
performed our analysis on 11 benchmark circuits widely used in the testing community.
We investigated the effect of ordering heuristics on tree-clustering and showed that, on our
benchmarks, the well-known max-cardinality ordering is substantially inferior to an
ordering called min-degree.
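
The role of the ordering can be made concrete with a small sketch of the min-degree heuristic. It is an illustration, not the code used in the study: the adjacency-dictionary representation, the alphabetical tie-breaking rule, and the five-node example graph are assumptions. The induced width of an elimination order is the largest number of neighbors a node has at the moment it is eliminated, and it bounds the size of the clusters that tree-clustering must handle.

```python
def induced_width(adj, order):
    """Largest neighbor set met while eliminating nodes in `order`."""
    g = {v: set(ns) for v, ns in adj.items()}
    width = 0
    for v in order:
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}  # connect the remaining neighbors (fill-in edges)
    return width

def min_degree_order(adj):
    """Repeatedly eliminate a node of minimum current degree (ties: alphabetical)."""
    g = {v: set(ns) for v, ns in adj.items()}
    order = []
    while g:
        v = min(sorted(g), key=lambda u: len(g[u]))
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}
        order.append(v)
    return order

# Hypothetical graph: a 4-cycle A-B-C-D-A with a pendant node E attached to A.
graph = {
    "A": {"B", "D", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
    "E": {"A"},
}

order = min_degree_order(graph)
print(order, induced_width(graph, order))
print(induced_width(graph, ["A", "B", "C", "D", "E"]))  # an arbitrary order, for comparison
```

On this toy graph the min-degree order gives a smaller induced width than the arbitrary alphabetical order, which is the kind of gap the empirical comparison above measures on real circuits.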

Dechter, R. (1997). Mini-Buckets: A general scheme for generating approximations in automated
reasoning. In Proceedings of the Fifteenth International Joint Conference of Artificial Intelligence
(IJCAI97), Japan, 1997.

A class of algorithms for approximating reasoning tasks was developed based on
approximating the general bucket-elimination framework. The algorithms have adjustable levels
of accuracy and efficiency and can be applied uniformly across many areas and problem
tasks. We introduced these algorithms in the context of combinatorial optimization and
probabilistic inference.
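
The inequality that a mini-bucket style approximation relies on for maximization tasks can be checked numerically. The sketch below is an assumption-laden illustration rather than the published algorithm: the two functions `f` and `g`, their values, and the choice to split a bucket of two functions into two mini-buckets are invented for the example. Maximizing each function separately over the eliminated variable X costs less than maximizing their product, and it can only overestimate the exact result.

```python
from itertools import product

# Two hypothetical functions in the bucket of X: f(X, A) and g(X, B), all binary.
f = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.8}
g = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.4}

exact, approx = {}, {}
for a, b in product((0, 1), repeat=2):
    # Exact bucket processing: maximize the product over X.
    exact[(a, b)] = max(f[(x, a)] * g[(x, b)] for x in (0, 1))
    # Mini-bucket processing: maximize each function over X separately.
    approx[(a, b)] = max(f[(x, a)] for x in (0, 1)) * max(g[(x, b)] for x in (0, 1))

for key in exact:
    print(key, exact[key], approx[key])
```

The exact result is a function of A and B jointly, while the approximation is a product of one function of A and one function of B, so splitting buckets lowers the size of the functions recorded at the price of a looser bound.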

Dechter, R., and Rish, I. (1997). A scheme for approximating probabilistic inference. In
Uncertainty in Artificial Intelligence (UAI97), Providence, Rhode Island.

A class of probabilistic approximation algorithms based on bucket elimination was
developed, offering adjustable levels of accuracy and efficiency. We analyzed the
approximation for several tasks: belief updating, finding the most probable explanation, and
finding the maximum a posteriori hypothesis. We identified regions of completeness and
provided empirical evaluations on randomly generated networks.

Learning in Belief Networks

Our research in learning of belief networks has focused on two issues. First, we have investigated a
special case of the Bayesian network known as the first-order Bayesian classifier. This classifier
assumes that variables are conditionally independent given the class variable. Our theoretical work
has investigated why the classifier performs well in practice even though the independence
assumption is violated. The research revealed two reasons. First, we showed that violations of
the independence assumption have less of an effect on finding the maximum a posteriori hypothesis
than they do on probability estimation. That is, although the independence assumption affects
probability estimation, this need not affect the classification outcome. We went on to identify
several concept classes for which the Bayesian classifier is optimal although the independence
assumption is violated. An important implication of this finding is that the largest violations of the
independence assumption do not necessarily have the largest effect on the accuracy of the
inferences. Second, we showed that there is a trade-off between errors caused by incorrectly
assuming independence and errors caused by estimating joint probabilities. We used this finding to
develop a learning algorithm that creates a Bayesian network by introducing edges to correct for
the most serious violations of the independence assumption. Finally, we investigated the use of the
Bayesian classifier in learning user models.
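
A small numerical sketch shows how the probability estimate can be wrong while the classification stays right. The setup is an assumption chosen for illustration, not taken from the publications: a binary class with prior 0.5, one binary attribute X, and a second attribute that is an exact copy of X, which is the most extreme violation of conditional independence. The Bayesian classifier counts the same evidence twice, so its posterior is overconfident, yet it falls on the same side of 0.5 as the true posterior.

```python
def posterior(prior, likelihoods):
    """P(C=1 | evidence) from P(C=1) and (P(e | C=1), P(e | C=0)) pairs,
    multiplied together as if the pieces of evidence were independent."""
    p1, p0 = prior, 1.0 - prior
    for l1, l0 in likelihoods:
        p1 *= l1
        p0 *= l0
    return p1 / (p1 + p0)

prior = 0.5                      # hypothetical P(C=1)
p_x1 = {1: 0.8, 0: 0.3}          # hypothetical P(X=1 | C=c)

results = {}
for x in (0, 1):
    if x == 1:
        lik = (p_x1[1], p_x1[0])
    else:
        lik = (1 - p_x1[1], 1 - p_x1[0])
    true_post = posterior(prior, [lik])        # the copy of X adds no information
    naive_post = posterior(prior, [lik, lik])  # the classifier counts X twice
    results[x] = (true_post, naive_post)
    print(x, round(true_post, 4), round(naive_post, 4))
```

In both cases the two posteriors differ noticeably, but the class with the higher posterior is the same, so the error under zero-one loss is unchanged.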

The three publications described below illustrate the published results.

Domingos, P., & Pazzani, M. (in press). Beyond Independence: Conditions for the Optimality of
the Simple Bayesian Classifier. Machine Learning.

The simple Bayesian classifier is known to be optimal when attributes are independent
given the class, but the question of whether other sufficient conditions for its optimality
exist had not been explored. Empirical results showing that it performs surprisingly well in
many domains containing clear attribute dependencies suggested that the answer to this
question may be positive. In this research we showed that, although the Bayesian classifier's
probability estimates are only optimal under quadratic loss if the independence assumption
holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even
when this assumption is violated by a wide margin. The region of quadratic-loss optimality
of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of
zero-one optimality. This implies that the Bayesian classifier has a much greater range of
applicability than previously thought. For example, we have shown it to be theoretically
optimal for learning conjunctions and disjunctions, even though they violate the
independence assumption.

Pazzani, M. (1997). Searching for dependencies in Bayesian classifiers. Artificial Intelligence and
Statistics IV, Lecture Notes in Statistics, Springer Verlag: New York.

Naive Bayesian classifiers, which make independence assumptions, perform remarkably
well on some data sets but poorly on others. We explored ways to improve the Bayesian
classifier by searching for dependencies among attributes. We proposed and evaluated two
algorithms for detecting dependencies among attributes and showed that the backward
sequential elimination and joining algorithm provides the most improvement over the naive
Bayesian classifier. The domains on which the most improvement occurs are those domains
on which the naive Bayesian classifier is significantly less accurate than a decision tree
learner. This suggests that the attributes used in some common databases are not
independent conditioned on the class and that the violations of the independence assumption
that affect the accuracy of the classifier can be detected from training data.

Billsus, Daniel & Pazzani, M. (1997). Learning Probabilistic User Models. In Workshop Notes of
"Machine Learning for User Modeling", Sixth International Conference on User Modeling, Chia
Laguna, Sardinia.

We described two applications that use rated text documents to induce a model of the user's
interests. We discussed the advantages and disadvantages of the Bayesian classifier and
presented a novel extension to this algorithm that is specifically geared towards improving
predictive accuracy for datasets typically encountered in user modeling and information
filtering tasks.

In addition to these publications, work is underway on augmenting the Bayesian classifier with a
tree representation of dependencies. Unlike earlier work1, we build the tree representation of the
probability distribution to maximize predictive accuracy. The earlier work builds the tree that best
approximates the probability distribution. Results on 10 commonly used benchmark problems
show an improvement over the earlier work by taking the specific nature of the classification
problem into account.

Proposed  Future  Work. 

The original proposal was for an ambitious three-year project on inference and learning in Bayesian
networks. An eighteen-month project was funded. We have completed approximately half of the
goals of the original proposal. Here, we outline the remaining work.

1. Stochastic greedy methods for inference (also called local repair algorithms). The majority of
our efforts to date have focused on approximate dynamic programming algorithms for
inference. Given the success of local repair algorithms on constraint networks, and the
relationship between constraint networks and belief networks, we will further investigate local
repair algorithms for belief networks. Our goal is to have approximation algorithms that:

• Return an optimal solution in a large fraction of cases, especially for those problems that are
known to be tractable

• Have an average performance substantially better than any of their complete counterparts

• Have a minimal deviation from optimal solutions

For more detailed information, see section s.3 and section 6.1 of the proposal.

2. Learning unrestricted Bayesian networks. Our current work has focused on learning two
special cases of Bayesian networks. The underlying theme behind these two approaches was the use
of task-specific information (i.e., the class variable) to bias the construction of the classifier. In
the next eighteen months, we propose to investigate a similar approach to learning Bayesian
networks that best approximate a probability distribution for a given task. Furthermore, we will
investigate approaches for revising expert networks.

For more detail, see section ? I of the original proposal.

1 N. Friedman and M. Goldszmidt (1996). Building Classifiers using Bayesian Networks. AAAI-96.

3. Integration of inference and learning. In the first half of this work, progress was made
independently on the two problems of inference and learning. There is the potential of
synergistically combining these two research issues. In particular, the learning system makes
use of the exact algorithms for finding the most plausible explanation. We anticipate that
learning unrestricted Bayesian networks will require the use of approximate inference methods.
This is elaborated in section I of the previous proposal.

Finding the most probable explanation in belief networks is an important task that appears in many
applications for diagnosis and abduction. We have made considerable progress on understanding
this inference task in belief networks and developing methods based on approximate dynamic
programming. We propose to make further progress and to explore local repair algorithms.
Similarly, we have made progress on learning restricted classes of belief networks and propose to
expand the class of networks that can be efficiently learned from data.